Parallel computer system using a SIMD method.

The parallel computer system comprises: a controller (10);
a plurality of control groups (G1 to G4), each constituted
by a number of processor elements (14) divided from a larger
number thereof, for use as an address control unit; a
plurality of scheduling circuits (110), each provided for
one of the control groups (G1 to G4) and operatively
connected to the controller (10), and for receiving and
managing an event signal designating an address for data to
be processed and transmitted from an adjacent control group;
and a plurality of real address generation circuits (120)
each of which is provided for one of the control groups (G1
to G4) and connected between the controller (10), scheduling
circuit (110) and control group, for generating an address
signal for data to be processed by the processor element
(14) belonging to the control group based on a base address
determined by the event signal to be managed by the
scheduling circuit (110), and an address signal applied from
the controller (10).
The present invention relates to a parallel computer
system using a SIMD method constituted by
a controller and a plurality of processor elements
connected to each other in a lattice configuration.

Parallel computer systems are widely used,
particularly in the field of CAD (Computer Aided
Design), which necessitates high speed calculation
for an LSI (large scale integrated) circuit design.
Accordingly, it is necessary to improve techniques
to make the processor elements operate more
efficiently in accordance with requirements of high
density and high speed LSI.

There are two types of parallel computers 
based on the connection configuration between the 
processor elements and the controller. One method 
is called an MIMD (multiple instruction stream mul- 
tiple data stream) method which is constituted by a 
plurality of processor elements and controllers. In 
this method, each of the processor elements is 
connected to a corresponding controller, respec- 
tively. Accordingly, it is necessary to provide the 
same number of controllers as there are proces- 
sors. However, it is difficult to constitute a large 
scale parallel computer system using this method 
because a large number of controllers are neces- 
sary in accordance with the number of processors, 
which can be from several tens to several hun- 
dreds of processors. 

The other method is called an SIMD (single 
instruction stream multiple data stream) method 
which is constituted by a plurality of processor 
elements and one controller. In this method, the 
controller is connected in parallel to all processor 
elements. Accordingly, it is possible to constitute a 
large scale parallel computer which has a large 
number of processor elements, for example, tens 
of thousands of processors. 

An example of the latter method is the "Connection
Machine" made by Thinking Machines Corporation.
This system is constituted by several tens of
thousands of processor elements.

A parallel computer system according to the
present invention can control all processor elements
so that the load is distributed effectively and
uniformly among the processor elements.

Embodiments of the present invention may 
provide a parallel computer system using a SIMD 
method enabling high efficiency data processing 
and high load distribution capability. 

According to the present invention, there is
provided a parallel computer system using a SIMD
method constituted by a controller and a plurality
of processor elements, each of the processor elements
having a storage unit to store data to be
processed, the controller controlling operation of
the processor elements, and the parallel computer
system performing processing of data based on a
calculation control signal transmitted from the
controller, the parallel computer system comprising: a
plurality of control groups, each control group being
constituted by a number of processor elements
divided from a plurality of processor elements, to
be utilized as an address control unit; a plurality of
scheduling circuits, with a scheduling circuit being
provided for each control group and operatively
connected to the controller, for receiving and managing
an event signal designating an address signal
for data to be processed and transmitted from
an adjacent control group; and a plurality of real
address generation circuits with a real address
generation circuit provided for each control group
and operatively connected to the controller, the
scheduling circuit and the control group, for generating
an address signal for data to be processed
by a processor element belonging to the control
group based on a base address determined by the
event signal to be managed by the scheduling
circuit and an address signal applied from the
controller.

Reference is made, by way of example, to the 
accompanying drawings in which: 

Fig. 1 is a basic block diagram of a type of
parallel computer system useful for understanding
the invention;

Fig. 2 is one version of a parallel computer system
shown in Fig. 1;

Fig. 3 is a schematic block diagram of a processor
element shown in Figs. 1 and 2;

Fig. 4 is a basic block diagram of a type of
parallel computer system embodying the
present invention;

Fig. 5 is a schematic block diagram of a processor
element shown in Fig. 4;

Fig. 6 is a view for explaining the concept of the
computer system shown in Fig. 4;

Fig. 7 is a view for explaining the division of a
virtual area shown in Fig. 6;

Figs. 8A and 8B are views for explaining addresses
of memory spaces shown in Fig. 6;

Fig. 9 is a view for explaining control groups
shown in Fig. 4;

Fig. 10 is a block diagram of control groups and
peripheral circuits;

Fig. 11 is a block diagram for explaining a pseudo
processor element;

Fig. 12 is a detailed block diagram of a scheduling
circuit shown in Fig. 4;

Fig. 13 is a detailed block diagram of an input
circuit for the window number shown in Fig. 12;

Fig. 14 is a detailed block diagram of a consecutiveness
detection circuit shown in Fig. 12;

Fig. 15 is a detailed block diagram of an event
input circuit shown in Fig. 12;

Fig. 16 is a logic table in an event interpretation
circuit shown in Fig. 12;

Fig. 17 is a detailed block diagram of a FIFO
circuit shown in Fig. 12;

Fig. 18 is a detailed block diagram of a registration
flag circuit shown in Fig. 12;

Figs. 19A to 19C are detailed block diagrams of
an address calculation circuit shown in Fig. 12;
and

Fig. 20 is a detailed block diagram of a real
address generation circuit shown in Fig. 12.

Figure 1 is a basic block diagram of one type 
of parallel computer system useful for understand- 
ing the invention, as background information. In 
Fig. 1, reference number 10 denotes a controller,
11 a control memory for storing a micro-code in- 
cluding output control signals, and 12 a global data 
register for performing an input/output operation of 
the data processed or to be processed. The control 
memory 11 and the global data register 12 are 
provided in the controller 10. Reference number 13 
denotes a data collection circuit for collecting out- 
put data from processor elements (PE) 14. 15A to 
15D denote control registers CR constituting a cal- 
culation control circuit and connected to each other 
using a pipe-line method for applying various cal- 
culation control signals to the collection circuit 13. 
16A to 16D denote gathering logic units (GLU) 
constituting the collection circuit 13 and each con- 
stituted by a tree configuration. Reference number 
17 denotes a signal line for the calculation control 
signal to the GLU, 18 a signal line for controlling 
processor elements, and 19 a data line for broad- 
casting global data. 

Each of the processor elements comprises a 
data register for storing the data to be processed 
and an arithmetic logic unit ALU as shown in Fig. 3.
The arithmetic logic unit ALU calculates the data
stored in the register in response to the order
transmitted from the controller 10 through the signal
line 18.

Each gathering logic unit GLU 16A to 16D 
collects the output data transmitted from the pro- 
cessor elements. The gathering logic units 16A to 
16D are connected in the form of a tree configuration
having several stages. That is, in Fig. 1 the
units 16A are the first stage, the units 16B are the
second stage, and the unit 16D is the final stage.
The outputs of the processor elements 14 are input 
to the gathering logic units 16A. The resultant 
calculation data obtained in the GLU's 16A are 
output to the GLU's 16B. Similarly, the resultant 
data obtained in the GLU's 16B are output to the 
next stage. The final stage 16D gathers all resultant 
data obtained in the previous stages and the data 
calculated in the final stage 16D is output to the 
global data register 12 in the controller 10. 
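
The stage-by-stage gathering described above can be sketched as a tree reduction. This is a minimal illustration only: the function names, the fan-in of two inputs per gathering unit, and the `OPS` table are assumptions made for the example and are not taken from the specification.

```python
from functools import reduce

# Hypothetical table of operations a gathering stage might apply; the keys
# mirror the calculation control signals named in the text.
OPS = {
    "ADD": lambda a, b: a + b,
    "MAX": max,
    "MIN": min,
    "AND": lambda a, b: a & b,
}


def gather_stage(values, op, fan_in=2):
    """Combine each group of `fan_in` inputs into one output (one tree stage)."""
    combine = OPS[op]
    return [reduce(combine, values[i:i + fan_in])
            for i in range(0, len(values), fan_in)]


def gather_tree(pe_outputs, op, fan_in=2):
    """Apply stages until one value remains, as the final stage would deliver
    it to the global data register."""
    values = list(pe_outputs)
    if not values:
        raise ValueError("at least one processor element output is required")
    while len(values) > 1:
        values = gather_stage(values, op, fan_in)
    return values[0]
```

With eight processor element outputs and a fan-in of two, the sketch passes through three stages before a single value is left.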

The calculation control registers CR 15A to 15D are
connected in series with each other using the
pipe-line method. The number of registers is equal
to the number of stages in the gathering logic unit
GLU. In this case, the calculation in each stage is
performed in response to the calculation control
signals, for example, an "ADD" calculation signal,
transmitted through the signal line 17. That is, when
the calculation signal "ADD" is input to the first
stage 16A, the calculation designated by the
calculation signal is performed in the first stage 16A
on the data output from the processor elements. This
calculation signal is transmitted to the next stage in
response to the clock signal from the controller 10
and the same calculation is performed in the second
stage 16B. The above calculation is performed using
the pipe-line method. That is, when the first
calculation signal "ADD" moves on to the second
stage, the next calculation signal, for example,
"MAX", can be input to the first stage.

The synchronization of all processor elements
is performed in accordance with a synchronization
signal from the controller 10. The controller 10
sends the synchronization signal to all processor
elements through the control line 18 to output the
value "1" when each processor element completes
the predetermined processing. At the same time
the signal "AND" is transmitted to the control
register 15A through the control line 17.

When the calculation signal "AND" is set in the
register 15A, the GLU 16A of the first stage performs
an "AND" calculation regarding all outputs
from the processor elements in response to the first
clock. The same "AND" calculation is performed in
the GLU 16B of the next stage in response to the
next clock. When the same "AND" calculation is
performed in the GLU 16D of the final stage in
response to the clock and the resultant data is the
value "1", the controller 10 can recognize that all
processor elements output the value "1".
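
The clock-by-clock movement of a calculation signal through the control registers can be sketched as follows. This is an illustrative model under stated assumptions: `run_pipeline`, the three-stage depth, and the fan-in of two are hypothetical, and the number of inputs is assumed to equal `fan_in ** n_stages` so that the final stage yields one value.

```python
from functools import reduce

# Hypothetical operation table keyed by calculation control signal.
OPS = {"ADD": lambda a, b: a + b, "MAX": max, "MIN": min,
       "AND": lambda a, b: a & b}


def gather_stage(values, op, fan_in=2):
    """One gathering stage: combine each group of `fan_in` inputs."""
    return [reduce(OPS[op], values[i:i + fan_in])
            for i in range(0, len(values), fan_in)]


def run_pipeline(jobs, n_stages=3, fan_in=2):
    """Simulate control registers shifting one stage per clock.

    `jobs` is a list of (signal, pe_outputs) pairs issued on consecutive
    clocks. Returns (clock, signal, result) for each job as it leaves the
    final stage.
    """
    ctrl = [None] * n_stages   # control register contents, one per stage
    data = [None] * n_stages   # output latch of each stage
    pending = list(jobs)
    results = []
    clock = 0
    while pending or any(op is not None for op in ctrl[:-1]):
        clock += 1
        op_in, values_in = pending.pop(0) if pending else (None, None)
        ctrl = [op_in] + ctrl[:-1]          # shift signals down the pipe
        inputs = [values_in] + data[:-1]    # each stage reads the previous latch
        data = [gather_stage(v, op, fan_in) if op is not None else None
                for op, v in zip(ctrl, inputs)]
        if ctrl[-1] is not None:
            results.append((clock, ctrl[-1], data[-1][0]))
    return results
```

In this model a second signal can enter the first stage one clock after the first, so two jobs issued back to back complete on consecutive clocks rather than one full pipeline latency apart.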

The essential processor element having the
essential data is extracted as follows. A unique
processor number is previously attached to each
processor element. First, the controller 10 commands
the essential processor element to output
its processor number. Second, the controller 10
commands the other processor elements to output a
suitable signal, for example, the value "11...1" or
"00...0". The controller 10 then sends the control
signal "MAX" or "MIN" to the control register 15A.
Accordingly, the essential processor element can
be selected in response to "MAX" or "MIN" of the
number in the collection circuit 13. In this case, a
next essential processor element can be selected
from the remaining processor elements excluding
the first essential processor element in the same
manner as the above. Accordingly, it is possible to
use this circuit to select the priority order of use of
a bus line.
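
The selection procedure above can be sketched with the "MAX" case. This is a hedged illustration: the function names are hypothetical, processor numbers are assumed to start at 1 so that the all-zero value "00...0" output by non-candidates never collides with a real number, and Python's built-in `max` stands in for the collection circuit.

```python
def select_by_max(pe_numbers, candidates):
    """Candidates output their own number, all others output 0; the MAX
    reduction then yields the highest-numbered candidate."""
    outputs = [n if n in candidates else 0 for n in pe_numbers]
    return max(outputs)


def priority_order(pe_numbers, candidates):
    """Repeat the selection, excluding each chosen element, to obtain an
    order of priority (for example for use of a bus line)."""
    pe_numbers = list(pe_numbers)
    # Ignore candidates that do not correspond to an existing element.
    remaining = set(candidates) & set(pe_numbers)
    order = []
    while remaining:
        chosen = select_by_max(pe_numbers, remaining)
        order.append(chosen)
        remaining.discard(chosen)
    return order
```

Using "MIN" instead would require the non-candidates to output the all-ones value, which is why the text mentions both "11...1" and "00...0".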
Figure 2 illustrates one particular version of the
system shown in Fig. 1. The same reference numbers
as used above indicate the same components.
In Fig. 2, reference number 20 denotes a processor 
array the elements of which are connected to each 
other in a lattice configuration. As explained above, 
the processor array and the collection circuit are 
controlled by the control signals from the controller 
10. The control memory 11 in the controller 10 
comprises a plurality of control formats 1 to n. 

The controller 10 further comprises a sequenc- 
er 21 which determines the sequence for reading 
out the control information from the control mem- 
ory 11. The global data register 12 is a register for 
holding the data transmitted in common to all pro- 
cessor elements and to receive the output data 
from the collection circuit 13. 

Figure 3 is a schematic block diagram of a processor
element. In Fig. 3, reference number 30 denotes a 
data register for holding the data to be processed. 
31 denotes an arithmetic logic unit ALU for cal- 
culating the data stored in the register 30. The 
processor element 14 is controlled by the same 
control signal transmitted from the controller 10. 
This control signal includes an address of the data 
register 30 and an operation code for the arithmetic 
logic unit 31. The processor element 14 further 
comprises four ports, i.e., east port (E), west port 
(W), north port (N) and south port (S) for commu- 
nicating between adjacent processor elements. The 
processor element 14 further comprises an input 
terminal GT for inputting the data from the global 
data register 12, and a collection terminal CT for 
outputting the data. 

The processor element 14 is a one-bit type
and the input/output operation to the data register
30 is basically performed for each bit. Data larger
than one bit is processed one bit at a time, starting
from either the most significant bit (MSB) or the
least significant bit (LSB).
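
Bit-by-bit processing of multi-bit data can be illustrated with two small routines. These are sketches only and do not reproduce the circuit of Fig. 3: addition is shown proceeding from the LSB because the carry propagates upward, while a magnitude comparison is shown proceeding from the MSB. The function names and the `width` parameter are assumptions for the example.

```python
def bit_serial_add(a, b, width):
    """Add two unsigned values one bit per step, LSB first.

    Returns (sum modulo 2**width, final carry)."""
    carry = 0
    result = 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        sum_bit = abit ^ bbit ^ carry
        carry = (abit & bbit) | (carry & (abit ^ bbit))
        result |= sum_bit << i
    return result, carry


def bit_serial_greater(a, b, width):
    """Return True if a > b, examining one bit per step, MSB first."""
    for i in reversed(range(width)):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        if abit != bbit:
            return abit > bbit
    return False
```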

Figure 4 is a basic block diagram showing a 
type of parallel computer system embodying the 
present invention. In Fig. 4, reference number 110 
denotes a scheduling circuit SC, 120 a real ad- 
dress generation circuit RAGC, and 150 a pseudo 
processor element. Further, G1 to G4 denote control
groups to be used as a control unit for accessing
the address. Accordingly, the processor elements
are divided into several control groups. The sched- 
uling circuit 110 and the real address generation 
circuit 120 are provided for each control group. 

The scheduling circuit 110 is a circuit for re- 
ceiving an event signal to designate the address 
and for managing the address designated by the 
event signal by using a queue. 

The real address generation circuit 120 is a
circuit for generating a real address of the data to
be processed by the processor element belonging
to that control group. This generation is performed
based on a base address determined by the event
signal and an address signal applied from the
controller 10.

The pseudo processor element 150 is provided
in the boundary portion of each control group. The
pseudo processor element 150 has a function of
sending the data corresponding to the address of
the processor element when the processor element
located at the boundary portion exchanges data
with the adjacent processor element belonging to
the adjacent control group. This circuit is
provided to ensure consecutiveness between the
processor elements.

Figure 5 is a schematic block diagram of a
processor element shown in Fig. 4. This drawing is
the same as Figure 3 except that an external
memory 200 is added between the data register 30
and the real address generation circuit 120. The
address of the external memory 200 is applied
from the real address generation circuit 120 provided
in every control group. This type of parallel
computer system embodying the invention mainly
relates to the address control for the external
memory 200.

Figure 6 is a view for explaining the concept of
the type of system shown in Fig. 4. Reference
number 301 denotes an actual processor element
group, 302 a first memory space corresponding to
the actual processor element group 301, and 300 a
second memory space (virtual area) corresponding
to a virtual processor element group. Accordingly,
the first memory space 302 coincides with an object
area to be processed by the actual processor
element group 301. In general, the object area to
be processed (for example, a wire pattern area)
coincides with the size of the actual processor
element group. However, in this type of system the
object area can be widened up to the second
memory space. In this case, the actual processor
element group 301 moves to the second memory
space 300 so that it is possible to process data
regarding the larger object area exceeding the first
memory space. Therefore, although the virtual
processor element group does not actually exist, it is
possible to obtain the same performance as the
processor element group having the second memory
space 300 by moving the actual processor
element group 301.

Figure 7 is a view for explaining division of the
virtual area shown in Fig. 6. The second memory
space 300 is divided into a plurality of windows (m
x n windows). Accordingly, one window corresponds
to the first memory space 302 processed by the
actual processor element group 301. Window
numbers from 0 to nm - 1 are attached to the
windows, respectively.
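
As a rough illustration of the window division, the following sketch maps a coordinate in the virtual area to a window number. The numbering order is not stated in this passage; row-major order is assumed here because it agrees with the later description of adjacent windows as A ± 1 horizontally and A ± B vertically. The function name and parameters are hypothetical.

```python
def window_number(x, y, window_width, window_height, windows_per_row):
    """Return the window containing point (x, y) of the virtual area,
    assuming windows are numbered row by row starting from 0."""
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative")
    col = x // window_width
    row = y // window_height
    if col >= windows_per_row:
        raise ValueError("x lies outside the virtual area")
    return row * windows_per_row + col
```

For a virtual area of 4 x 4 windows of 128 x 128 elements each, the point (130, 0) falls in window 1 and the point (0, 128) in window 4.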
Figures 8A and 8B are views for explaining
addresses of memory spaces. In Fig. 8A, the external
memory 200 of one processor element 14 is
divided into sixteen memory spaces for the virtual
processor elements. That is, "0000" to "FFFF" are
sixteen-bit addresses for the external memory,
while "000" to "FFF" are twelve-bit addresses
for the virtual area. Accordingly, one actual
processor element functions as
sixteen virtual processor elements.

In Fig. 8B, the window number denotes the
base address indicating the head of each memory
space of the virtual PE (processor element) and is
constituted by eight bits "aaaa 0000" since a maximum
of 256 windows can be provided. Since the
external memory 200 is divided into sixteen blocks
in this embodiment, the lower four bits are set to
"0000". The virtual PE address "0000
bbbbbbbbbbbb" denotes the relative address of
each memory space of the virtual PE. The virtual
PE address is transmitted in common to all processor
elements from the controller 10. The virtual PE
address has "0000" in the upper bits in accordance
with the number of the windows. As explained
in Fig. 8A, when the number of the windows is
sixteen, the virtual PE address is constituted by
twelve bits. As shown in Fig. 8B, the real address
"aaaabbbbbbbbbbbb" having sixteen bits of the
external memory 200 can be obtained by adding
(or performing an OR operation on) the base address
and the virtual PE address.
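
The address composition of Fig. 8B can be checked with a short sketch. It assumes, as an interpretation of the bit patterns quoted above, that the eight-bit window number "aaaa 0000" is aligned with the upper eight bits of the sixteen-bit real address; the function name is hypothetical.

```python
def real_address(window_number, virtual_pe_address):
    """Combine an 8-bit base ("aaaa 0000") with a 12-bit relative address
    ("0000 bbbbbbbbbbbb") into a 16-bit real address."""
    if not 0 <= window_number <= 0xFF:
        raise ValueError("window number must fit in eight bits")
    if window_number & 0x0F:
        raise ValueError("lower four bits of the base must be zero here")
    if not 0 <= virtual_pe_address <= 0x0FFF:
        raise ValueError("virtual PE address must fit in twelve bits")
    # The overlapping bits are zero on the base side, so OR and ADD agree.
    return (window_number << 8) | virtual_pe_address
```

For example, a base of A0 (hexadecimal) and a relative address of 123 give the real address A123. Because the overlapping bits are zero on one side, the OR result equals the arithmetic sum, which is why the text treats the two operations as interchangeable.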

The processing of the data in the virtual PE is
performed in such a way that the real PE sequentially
processes the corresponding data in the virtual
memory space divided from the real external
memory 200. In this case, the simplest method
is one in which the real PE always
sequentially processes all virtual PE's included in its
own external memory. However, this method is not
efficient because the virtual PE's in which the
processing is not necessary are included. Accordingly,
the present invention selects the virtual PE's in
which the processing is necessary so that the
efficiency of the processing can be raised. Therefore,
the concept of the "event" is employed to
realize this method in the invention.

The event is started when the conditions for
processing the virtual PE are satisfied. The virtual
PE which received the event is handled as the
object to be processed by the real PE. The controller
determines the content of the event in accordance
with a program.

Figure 9 is a view for explaining control groups
shown in Fig. 4. As shown in the drawing, the
processor elements (PE) 14 are divided into the
control groups G1, G2, and so on. For example, the PE's
of 128 x 128 are divided into sixteen control groups
G1 to G16 each having 32 x 32 PE's.
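
The division into control groups can be sketched as an index calculation. The ordering of G1 to G16 over the array and the orientation of north (row 0) are assumptions made for illustration; only the sizes 128 x 128 and 32 x 32 come from the text.

```python
GROUP_SIZE = 32    # PE's per side of one control group (from the text)
ARRAY_SIZE = 128   # PE's per side of the whole array (from the text)


def control_group(row, col, group_size=GROUP_SIZE, array_size=ARRAY_SIZE):
    """Return the group index (1 for G1, ...) of the PE at (row, col),
    assuming the groups are numbered row by row."""
    if not (0 <= row < array_size and 0 <= col < array_size):
        raise ValueError("PE position outside the array")
    groups_per_row = array_size // group_size
    return 1 + (row // group_size) * groups_per_row + col // group_size


def boundary_directions(row, col, group_size=GROUP_SIZE):
    """Return the sides (N, S, E, W) of its control group on which the PE
    lies, taking row 0 as the north edge."""
    r, c = row % group_size, col % group_size
    marks = set()
    if r == 0:
        marks.add("N")
    if r == group_size - 1:
        marks.add("S")
    if c == 0:
        marks.add("W")
    if c == group_size - 1:
        marks.add("E")
    return marks
```

Under these assumptions each side of a control group contains 32 boundary PE's.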



Figure 10 is a block diagram of control groups
and peripheral circuits. In Fig. 10, G1 to G16 are
control groups, 110 (SC) is the scheduling circuit
provided for each control group, and 120 is a real
address generation circuit also provided for each
control group. The scheduling circuit 110 receives
the event from the PE and manages the virtual PE
to be processed. The virtual PE number to be
processed, i.e., the window number, is queued in
the scheduling circuit 110 and sequentially processed
from the head of the queue. The scheduling
circuit 110 sends the base address corresponding
to the virtual PE to the real address generation
circuit 120. Accordingly, the scheduling circuit 110
performs the queueing and assigns the real PE.
The real address generation circuit 120 generates
the real address based on the relative address
of the virtual PE and the base address. In
this case, the relative address indicates a kind of
control signal transmitted in common from the
controller to all PE's, and the base address is determined
by the scheduling circuit 110. The real address
is transmitted to the real PE's in each control
group.

The scheduling circuit 110 is connected to four
adjacent scheduling circuits. Each input/output
signal is explained below.



Event signal (as input signal)

This event signal is obtained by the OR logic
among the event signals transmitted from all PE's
(32 PE's in this embodiment) located on the
boundary of the control group, and is used as the
input signal. This signal has one bit for each of the
four directions E, W, N, and S.

Window number signal (as input signal)

The window number signal of the adjacent
scheduling circuit 110 is input as the input signal.
The window number signal has eight bits as shown
in Fig. 8B for each of the four directions E, W, N and S. The
scheduling circuit inputs the window number
corresponding to the event signal when that event signal
is activated, and performs the queuing.

Self-event signal (as input signal) 

This signal is obtained by the OR logic among 
all event signals of the PE's included in its own 
control group, and has one bit. 

Window number signal (as output signal)

This signal is the window number signal output
to the adjacent scheduling circuit 110, and has
eight bits for each of the four directions E, W, N, and S.

Base address signal (as output signal)

This signal is an output signal to the real ad- 
dress generation circuit 120 indicating the cor- 
responding address to the window number of the 
virtual PE read-out from the head of the queue. 

Various control signals (as input/output signal) 

These signals are output or input signals to or 
from the controller 10. For example, the control 
signal NEXT is a signal to indicate reading out a
next virtual PE from the queue, and the control 
signal DIR is a signal to indicate the direction of 
the data flow in four directions E, W, N, and S. The 
control signal EMPTY is a signal to indicate va- 
cancy of the input signal, the clock signal, and the 
queue. 

Figure 11 is a block diagram for explaining the
pseudo processor element (PE) shown in Fig. 4. In
Fig. 11, the boundary BD of the control group is
provided between the processor elements 14A and
14B. That is, the PE 14A is adjacent to the PE 14B.
The pseudo PE (PS-PE) 150A is provided adjacent
to the PE 14A, and the pseudo PE 150B is provided
adjacent to the PE 14B, respectively.

The pseudo PE is provided for ensuring the 
consecutiveness of the processing between adja- 
cent control groups. This is because the adjacent 
control group can not receive the necessary value 
of the window when the object window between the 
adjacent control groups is different. Accordingly, as 
shown in Figs. 4 and 11, the pseudo PE is pro- 
vided to each end of the row of the PE's in each 
control group. Therefore, when the object window 
is consecutive between adjacent control groups, 
the pseudo PE's are not used and the PE 14A 
directly accesses the PE 14B by switching the 
selectors S1 and S2. 

When the PE 14A performs the read/write 
(R/W) operation to its own external memory 200A, 
the write data is simultaneously written to the exter- 
nal memory 200a belonging to the pseudo PE 
150A. When the PE 14A transmits the data to the 
PE 14B, the pseudo PE 150A reads the data from
the external memory 200a and transmits that data 
to the PE 14B through the selector S1 instead of 
the PE 14A. The address of the external memory 
200a is the window address of the PE 14B side. 
The same operation as the above is performed in 
case of the data transmission from the PE 14B to 
the PE 14A. Although this drawing shows the con- 
nection of one direction as the lattice of one dimen- 
sion, it is possible to connect two directions as a 
lattice of two dimensions. 

Figure 12 is a detailed block diagram of the
scheduling circuit shown in Fig. 4, and Figures 13
to 19 are detailed circuits of the diagram in Fig. 12.
In Fig. 12, reference number 500 denotes an input
circuit for the window number, 510 a registration
table, 520 a consecutiveness detection circuit, 530
an input circuit for the event, 540 an event
interpretation circuit, 550 a first-in/first-out (FIFO) circuit,
560 a registration flag circuit, 570 an address holding
circuit, and 580 an address calculation circuit.
Further, R1 to R4 denote registers for the pipe-line
control.

The input circuit 500 inputs the window number
determined from the four adjacent directions E, W,
N, and S, where DIR is the control signal for
indicating the data flow. This circuit is shown in
detail in Fig. 13.

In Fig. 13, R10 denotes a register for holding
the window numbers input from four directions E,
W, N and S. S10 denotes a selector for selecting
the window number in response to the control
signal DIR and outputting the selected window
number having eight bits.

The registration table 510 is a table for storing
flags indicating whether or not the window number
is registered. One bit is assigned to each window
in a maximum of 256 windows. Accordingly, the
window number from the input circuit 500 becomes
the address in the table 510. Therefore, double
registration of a window number is prevented by
this method.
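
The interplay of the registration table and the queue can be sketched as follows. This is a behavioural illustration, not the circuit: the class and method names are hypothetical, and clearing the flag when a window is read out of the queue is an assumption, since this passage does not say when a flag is reset.

```python
from collections import deque


class WindowScheduler:
    """Queue of window numbers guarded by a one-bit-per-window table."""

    def __init__(self, n_windows=256):
        # One flag per window; the window number is used as the table address.
        self.registered = [False] * n_windows
        self.queue = deque()

    def register(self, window):
        """Queue the window unless it is already registered.

        Returns True if the window was newly queued."""
        if self.registered[window]:
            return False
        self.registered[window] = True
        self.queue.append(window)
        return True

    def next_window(self):
        """Read the window at the head of the queue (cf. signal NEXT)."""
        window = self.queue.popleft()
        self.registered[window] = False  # assumption: flag cleared on read-out
        return window
```

Because the check is a single table lookup addressed by the window number, its cost does not depend on how many windows are already queued.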

The consecutiveness detection circuit 520 determines
the consecutiveness between the present
area and the adjacent area. The detailed circuit is
shown in Fig. 14.

In Fig. 14, COMP denotes a comparator, 600
an encoder (ECD), OR an OR circuit, and S20 a
selector (SEL). CE, CW, CN and CS denote registers
for storing the resultant data of the detection of
the consecutiveness until the reset signal is input.
The comparator COMP compares the upper bits of
the address of its own control group with the window
number input from the input circuit 500. When
the former coincides with the latter, the encoder
600 outputs an enable signal in response to the
direction control signal DIR. The enable signal is
stored in the registers CE, CW, CN and CS as
consecutiveness data and the consecutiveness
data C-FLAG is output from the selector S20 in
response to the control signal DIR.

The event input circuit 530 receives the event
signals from four directions. The detailed circuit is
shown in Fig. 15.

In Fig. 15, EVCLR denotes an event clear signal
to clear each register R. S30 denotes a selector
circuit. The register R is cleared by the event clear
signal EVCLR. When the event signal is loaded in
the register R, the event signal is output from the
selector S30 through the AND circuit.

The event interpretation circuit 540 judges
whether or not the queuing of the window number
should be performed, or whether or not the present
address should be held. The detailed logic table for
determining the output from this circuit 540 is
shown in Fig. 16.

In Fig. 16, T denotes an active state of the
signals. The registration signal REG indicating the
registration of the window number is output from the
circuit 540 only when the output of the input circuit
530 is active. The address holding signal AHS is
output when the consecutiveness signal C-FLAG
and the event signal are active. Further, the address
holding signal is output when the self-event
signal is active.

The FIFO 550 stores the window number to be
processed in accordance with the event signal. The
detailed circuit is shown in Fig. 17.

In Fig. 17, MEM denotes a memory having the
capacity of 8 x 256 bits, R40 to R43 registers, S40
a selector, WCNT a write counter to output the
write address, RCNT a read counter to output the
read address, and COMP a comparator. When the
registration signal is set in the register R41, the
window number stored in the register R40 is written
to the address indicated by the write counter
WCNT in the memory MEM. Further, the content of
the address of the memory MEM is read out in
response to the control signal NEXT applied through
the AND circuit, and is output
through the register R43. When the comparator
detects coincidence between the content of the
write counter WCNT and the content of the read
counter RCNT, a signal EMPTY indicating the vacant
state is output.
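
The counter-based operation of the FIFO can be sketched in software as below. Names such as `WindowFifo` are hypothetical, and the sketch makes one simplification that should be noted: with wrap-around counters, a completely full memory would also give equal counter values, and the sketch does not model how the circuit treats that case.

```python
class WindowFifo:
    """256-entry FIFO of 8-bit window numbers with write and read counters."""

    DEPTH = 256

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.wcnt = 0  # write counter (WCNT): next address to write
        self.rcnt = 0  # read counter (RCNT): next address to read

    @property
    def empty(self):
        """EMPTY is signalled when the two counters coincide."""
        return self.wcnt == self.rcnt

    def write(self, window_number):
        """Store a window number at the address given by the write counter."""
        self.mem[self.wcnt] = window_number & 0xFF
        self.wcnt = (self.wcnt + 1) % self.DEPTH

    def next(self):
        """Read the entry at the read counter (cf. control signal NEXT)."""
        if self.empty:
            raise IndexError("FIFO is empty")
        value = self.mem[self.rcnt]
        self.rcnt = (self.rcnt + 1) % self.DEPTH
        return value
```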

The detailed registration flag circuit 560 is shown in
Fig. 18. In Fig. 18, 700 denotes an encoder,
and R a register. The direction of the registered
window number is stored in the register R
after being encoded by the encoder 700 in accordance
with the direction control signal DIR.

The address calculation circuit 580 outputs the
window number to be reported to the adjacent
control group and the upper address bits used for
generation of the real address based on the window
number read out from the FIFO 550. The
detailed circuit is shown in Figs. 19A to 19C.

In Fig. 19A, in the boundary of the window, the
control group sends the window numbers (A + 1)
and (A - 1) for the horizontal direction, and sends
the window numbers (A + B) and (A - B) for the
normal direction, where B denotes the number of
the windows for the transverse direction when the
virtual area is divided into the plural windows.
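
As a minimal sketch of the window numbering described above, the adjacent window numbers can be computed from the present window number A and the transverse window count B. The function name `neighbour_windows` and the dictionary keys are hypothetical, chosen only for this illustration; the sketch does not model the boundary marks.

```python
def neighbour_windows(a, b):
    """Return the window numbers adjacent to window `a`.

    `b` is the number of windows in the transverse direction
    when the virtual area is divided into windows.
    """
    return {
        "horizontal": (a + 1, a - 1),  # (A + 1) and (A - 1)
        "normal": (a + b, a - b),      # (A + B) and (A - B)
    }
```

For example, with B = 4, window 5 has the horizontal neighbours 6 and 4 and the neighbours 9 and 1 in the normal direction.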

The boundary of the window is distinguished by
the boundary marks (E, W, N, S). The value of
each boundary mark is set by the controller 10
in the initial stage.

In Fig. 19C, ALU denotes a calculation circuit,
R80 to R82 registers, and S80 to S82 selectors.



The calculation circuit ALU calculates any of the
window numbers A, A ± 1, and A ± B in accordance
with the boundary mark E, W, N, S shown in
Fig. 19B. An address designation value ADD-DEG
designates a mode using the address transmitted
from the controller 10 as an absolute address
regardless of the present window number. When this
mode is designated, the address designation value
is transmitted to the real address generation
circuit 120 through the selector S80 and a further
selector.

Figure 20 is a detailed block diagram of the
real address generation circuit. In Fig. 20, R100 to
R105 denote registers, S100 to S103 selectors, and
OR denotes OR circuits. The input signals to this
circuit are the relative address of the virtual PE
transmitted from the controller 10, the upper
address bits from the address calculation circuit,
and the adjacent window numbers. The real address
to the PE belonging to its own control group is
generated by combining the relative address set in
the register R100 with the upper address bits set in
the register R101 as shown in Fig. 8B. As shown in
Fig. 8B, in the upper eight bits, when the base
address and the relative address overlap, one side
is set to "0". The real address is obtained by the
logic OR. In this case, the lower eight bits of the
real address are the same bits as transmitted from
the controller. Further, to generate the real
address for the adjacent pseudo PE, the window
number of the adjacent PE is set in the registers
R102 to R105. These window numbers are controlled
by the selectors S100 to S103 to be the address of
the adjacent PE when loading (L), and to be
the self-address when saving (S).
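
The logic OR described above can be illustrated with a short sketch. The 16-bit address width, the default shift of eight bits, and the name `real_address` are assumptions made only for this example, since Fig. 8B is not reproduced here. The sketch rejects inputs in which both sides have a bit set in the same position, reflecting the condition that one side is set to "0" wherever the two overlap.

```python
def real_address(window_number, relative_address, window_shift=8):
    """Combine a base address and a relative address by logical OR.

    Assumed for illustration: a 16-bit real address in which the
    window number, shifted left by `window_shift`, forms the base.
    """
    base = (window_number << window_shift) & 0xFFFF
    rel = relative_address & 0xFFFF
    if base & rel:
        # One side must hold "0" wherever the two fields overlap.
        raise ValueError("base and relative address both set a bit")
    return base | rel
```

With these assumptions, `real_address(0x12, 0x34)` gives `0x1234`. Because no bit position is set on both sides, the OR result equals the arithmetic sum of the base address and the relative address, and the lower eight bits of the result are the bits of the relative address unchanged.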

In this embodiment, although the multi-processor
is constituted by lattice coupling, it is possible
to constitute it by hyper-cubic coupling in
accordance with the application.

Claims 



1. A parallel computer system using a SIMD
method constituted by a controller and a plurality
of processor elements, each of said processor
elements having a storage means to
store data to be processed, said controller
controlling operation of said processor elements,
and said parallel computer system performing
processing of said data based on a
calculation control signal transmitted from said
controller, said parallel computer system
comprising:
a plurality of control groups, each of said
control groups being constituted by a number
of processor elements divided from a plurality
of said processor elements, for use as an
address control unit,

a plurality of scheduling means, each of
said scheduling means provided for one of
said control groups and operatively connected
to said controller, and for receiving and
managing an event signal designating an address
signal for data to be processed and transmitted
from an adjacent control group, and

a plurality of real address generation
means, each of said real address generation
means provided for one of said control groups
and connected between said controller, said
scheduling means, and said control group, for
generating an address signal for data to be
processed by a processor element belonging
to said control group based on a base address
determined by said event signal to be managed
by said scheduling means and an address
signal applied from said controller.

2. A parallel computer system as claimed in
claim 1, wherein each of said control groups
further comprises a plurality of pseudo processor
elements, each pseudo processor element
being connected to each processor element
located in a boundary of an adjacent control
group, and for transmitting data at data address
to be handled by said processor element
to the processor element belonging to said
adjacent control group.

3. A parallel computer system as claimed in
claim 1 or 2, wherein each said processor
element and said pseudo processor element
further comprise external memories to store
read and write data, said write data being
simultaneously written into said external memory
of said pseudo processor element when
said processor element writes data into its own
external memory.

4. A parallel computer system as claimed in
claim 3, wherein memory space of said external
memory of said processor element is divided
into a plurality of memory spaces (windows)
of virtual processor elements.

5. A parallel computer system as claimed in
claim 4, wherein each of said windows comprises
a window number as a base address,
said virtual processor element comprises a virtual
processor element address as a relative
address, and a real address of said processor
element is obtained by adding said base address
to said relative address or by performing
a logical OR operation between said base address
and said relative address.